CPU and GPU Performance

Author

Leif Harvey

Abstract

This report explores a data set of \(4854\) CPU and GPU chips released since 2000, covering process size, die size, transistor count, frequency, thermal design power, foundry, vendor, and FP32 GFLOPS. TSMC and Intel fabricate the large majority of the chips, though foundry performance comparisons are confounded by foundry-vendor partnerships. Transistor counts over time show the exponential growth predicted by Moore's Law, and several variables, including transistor count and TDP, appear associated with FP32 GFLOPS performance.

Introduction

This project examines the performance of CPU and GPU chips released since 2000. The data set includes chips from several vendors, such as Intel, NVIDIA, ATI, and AMD. Other variables include process size, thermal design power (TDP), die size, transistor count, frequency, foundry, and FP32 GFLOPS. GFLOPS, billions of floating-point operations per second, is a common measure for comparing graphics-card performance.

chips_origional |> slice(1:5) |> select(2:5, 9:11)
# A tibble: 5 × 7
  Product   Type  `Release Date` `Process Size (nm)` `Freq (MHz)` Foundry Vendor
  <chr>     <chr> <chr>                        <dbl>        <dbl> <chr>   <chr> 
1 AMD Athl… CPU   2007-02-20                      65         2200 Unknown AMD   
2 AMD Athl… CPU   2018-09-06                      14         3200 Unknown AMD   
3 Intel Co… CPU   2020-09-02                      10         2600 Intel   Intel 
4 Intel Xe… CPU   2013-09-01                      22         1800 Intel   Intel 
5 AMD Phen… CPU   2011-05-03                      45         3700 Unknown AMD   

Feature Exploration

Chip Type

The data set is split roughly evenly, with \(2192\) CPUs and \(2662\) GPUs.

chips_origional |> group_by(Type) |> summarise(Count = n())
# A tibble: 2 × 2
  Type  Count
  <chr> <int>
1 CPU    2192
2 GPU    2662

Foundries

The data set contains chips from nine different foundries, with TSMC and Intel making the vast majority. Another \(866\) chips did not have a foundry listed.

chips_origional |> group_by(Foundry) |> summarise(Count = n()) |> arrange(desc(Count))
# A tibble: 10 × 2
   Foundry Count
   <chr>   <int>
 1 TSMC     2178
 2 Intel    1390
 3 Unknown   866
 4 GF        265
 5 UMC        79
 6 Samsung    60
 7 Sony       10
 8 IBM         3
 9 NEC         2
10 Renesas     1

Vendors

The data set contains chips from four vendors: AMD, Intel, NVIDIA, and ATI. Only \(64\) chips came from other, unspecified vendors.

chips_origional |> group_by(Vendor) |> summarise(Count = n()) |> arrange(desc(Count))
# A tibble: 5 × 2
  Vendor Count
  <chr>  <int>
1 AMD     1662
2 Intel   1392
3 NVIDIA  1201
4 ATI      535
5 Other     64

Transistors

The average number of transistors per chip is \(1929.922\) million, while the median is \(624\) million. The range runs from \(8\) million up to \(54.2\) billion, and the histogram shows an extreme right skew.

ggplot(data = chips, aes(x = transistors)) + 
  geom_histogram(fill = "lightblue", color = "darkblue") +
  theme_minimal() + 
  ylab("Number of Processors") + 
  xlab("Transistors (millions)")

no_na_t <- chips |> select(transistors) |> filter(!is.na(transistors))
no_na_t |> tidy()
# A tibble: 1 × 13
  column     n  mean    sd median trimmed   mad   min   max range  skew kurtosis
  <chr>  <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1 trans…  4143 1930. 4045.    624   1060.   546     8 54200 54192  6.13     61.7
# ℹ 1 more variable: se <dbl>
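With skew this extreme, a log10 x-axis makes the shape of the distribution much easier to read. A minimal sketch, assuming the same `chips` data frame and `transistors` column used above:

```r
library(ggplot2)

# Same histogram as above, but on a log10 axis so the right tail
# no longer compresses the bulk of the data into one bar
ggplot(data = chips, aes(x = transistors)) +
  geom_histogram(fill = "lightblue", color = "darkblue") +
  scale_x_log10() +
  theme_minimal() +
  ylab("Number of Processors") +
  xlab("Transistors (millions, log scale)")
```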

Frequency

The average processor frequency is \(1484.406\) MHz, and the median is \(1073.5\) MHz. The range runs from \(100\) MHz up to \(4700\) MHz. The graph shows a right skew with a peak around \(500\) MHz.

ggplot(data = chips, aes(x = freq)) + 
  geom_histogram(fill = "lightblue", color = "darkblue") +
  theme_minimal() + 
  ylab("Number of Processors") + 
  xlab("Frequency (MHz)")

no_na_f <- chips |> select(freq) |> filter(!is.na(freq))
no_na_f |> tidy()
# A tibble: 1 × 13
  column     n  mean    sd median trimmed   mad   min   max range  skew kurtosis
  <chr>  <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1 freq    4854 1484. 1067.  1074.   1385.  674.   100  4700  4600 0.646     2.16
# ℹ 1 more variable: se <dbl>

Process Size

The average process size is \(55.1\) nm, and the median is \(40\) nm. The range runs from \(0\) nm up to \(250\) nm; the \(0\) nm minimum is almost certainly missing data coded as zero, since no real process node is 0 nm. The graph shows a right skew with a peak around \(25\) nm.

ggplot(data = chips, aes(x = process_size)) + 
  geom_histogram(fill = "lightblue", color = "darkblue") +
  theme_minimal() + 
  ylab("Number of Processors") + 
  xlab("Process Size (nm)")

no_na_p <- chips |> select(process_size) |> filter(!is.na(process_size))
no_na_p |> tidy()
# A tibble: 1 × 13
  column     n  mean    sd median trimmed   mad   min   max range  skew kurtosis
  <chr>  <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1 proce…  4845  55.1  45.0     40    48.7    25     0   250   250  1.20     3.75
# ℹ 1 more variable: se <dbl>

This project will focus on determining which foundries and vendors produce the best-performing chips.

Questions of Interest:

Are some foundries clearly producing better performing products than others?

Can we see Moore’s Law in the data?

What variables seem to play a role in a high FP32 GFLOPS value?

Can we create a model to predict GFLOPS?

Is there a Best Foundry?

One potential issue when looking at foundries and performance is that foundries have partnerships with particular vendors, so a foundry's numbers may reflect its vendors' designs as much as its own manufacturing. Let's look anyway with this in mind. The plot is interactive and shows the vendor when a point is hovered over.

gpu <- chips |> filter(!is.na(fp32gflops))

foundry_gflops <- ggplot(data = gpu, aes(x = Foundry, y = fp32gflops, label = Vendor)) + 
  geom_jitter(alpha = 0.5) + 
  theme_minimal() + 
  ylab("FP32 GFLOPS") + 
  labs(title = "Foundry vs FP32 GFLOPS",
       caption = "GFLOPS represents Billions of Floating Point Operations Per Second")

ggplotly(foundry_gflops, tooltip = "label")

It appears that Samsung is effectively a proxy for NVIDIA, as it mainly produces NVIDIA GPUs. TSMC produces products for every vendor but Intel; Intel mainly focuses on CPUs, and many of its GPUs are just integrated ones. GF also has some high-performing chips, all of which are AMD chips.
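A rough numeric ranking to accompany the plot: median FP32 GFLOPS by foundry, assuming the `gpu` data frame built in the chunk above. The vendor-partnership caveat still applies, so this orders foundries without attributing performance to them.

```r
library(dplyr)

# Median is preferred over mean here because of the heavy right skews
gpu |>
  group_by(Foundry) |>
  summarise(Count = n(), `Median GFLOPS` = median(fp32gflops)) |>
  arrange(desc(`Median GFLOPS`))
```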

Moore’s Law

Gordon Moore, co-founder of Fairchild Semiconductor and later co-founder and CEO of Intel, observed that the number of transistors in an integrated circuit doubles about every two years. In 1965 he predicted a doubling every year for at least a decade, and in 1975 he revised the prediction to every two years, which has held since. We can examine transistor counts over the period covered by this data set, which includes chips released since 2000.

transistors_time <- ggplot(data = chips, aes(x = date, y = transistors, label = date)) + 
  geom_point(alpha = 0.5) + 
  theme_minimal() + 
  ylab("Transistors (millions)") + 
  labs(title = "Transistors over Time")

ggplotly(transistors_time, tooltip = "label")

The plot shows an exponential trend, which indeed lines up with Moore's prediction; the data set's window appears to sit on the steep part of the curve.
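One way to quantify the two-year doubling claim is to regress \(\log_2\) of the transistor count on release year: the reciprocal of the slope estimates the doubling time. The sketch below demonstrates the idea on synthetic data so that it runs standalone; the same two lines of modeling apply to the real `chips` data frame.

```r
# Synthetic release years and transistor counts that double every 2 years,
# with multiplicative noise (stand-in for the real chips data)
set.seed(1)
years <- runif(300, 0, 22)
transistors <- 37.5 * 2^(years / 2) * exp(rnorm(300, sd = 0.4))

# Regress log2(transistors) on years; 1 / slope = doubling time in years
fit <- lm(log2(transistors) ~ years)
doubling_time <- 1 / coef(fit)[["years"]]
doubling_time  # close to 2, as Moore's Law predicts
```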

FP32 GFLOPS

To see which variables matter for performance, we plot FP32 GFLOPS against each candidate predictor in turn.

no_na_g <- chips |> filter(!is.na(fp32gflops))

ggplot(data = no_na_g, aes(x = transistors, y = fp32gflops)) + 
  geom_point() + 
  theme_minimal() + 
  ylab("Billion Floating Point Operation per Second") + 
  xlab("Transistors")
Warning: Removed 137 rows containing missing values or values outside the scale range
(`geom_point()`).

ggplot(data = no_na_g, aes(x = die_size, y = fp32gflops)) + geom_point() + theme_minimal() + xlab("Die Size") + ylab("FP32 GFLOPS")
Warning: Removed 115 rows containing missing values or values outside the scale range
(`geom_point()`).

ggplot(data = no_na_g, aes(x = process_size, y = fp32gflops)) + geom_point() + theme_minimal() + xlab("Process Size (nm)") + ylab("FP32 GFLOPS")
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_point()`).


ggplot(data = no_na_g, aes(x = TDP, y = fp32gflops)) + geom_point() + theme_minimal() + xlab("TDP") + ylab("FP32 GFLOPS")
Warning: Removed 203 rows containing missing values or values outside the scale range
(`geom_point()`).

ggplot(data = no_na_g, aes(x = freq, y = fp32gflops)) + geom_point() + theme_minimal() + xlab("Frequency (MHz)") + ylab("FP32 GFLOPS")

ggplot(data = no_na_g, aes(x = date, y = fp32gflops)) + geom_point() + theme_minimal() + xlab("Release Date") + ylab("FP32 GFLOPS")
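As a first pass at the modeling question, a linear model on log10 scales (to tame the heavy right skews seen earlier) can be sketched. The choice of predictors here is illustrative, not final, and `no_na_g` is the data frame built at the top of this section.

```r
# Fit FP32 GFLOPS on log10 scales; lm() drops rows with missing
# predictors by default (na.action = na.omit)
gflops_fit <- lm(log10(fp32gflops) ~ log10(transistors) + log10(TDP) + freq,
                 data = no_na_g)
summary(gflops_fit)  # inspect which predictors carry weight
```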

Bibliography

Moore's law. Wikipedia. https://en.wikipedia.org/wiki/Moore%27s_law